Varying input segmentation for story boundary detection in English, Arabic and Mandarin broadcast news
نویسندگان
چکیده
Story segmentation of news broadcasts has been shown to improve the accuracy of the subsequent processes such as question answering and information retrieval. In previous work, a decision tree trained on automatically extracted lexical and acoustic features was trained to predict story boundaries, using hypothesized sentence boundaries to define potential story boundaries. In this paper, we empirically evaluate several alternatives to this choice of input segmentation on three languages: English, Mandarin and Arabic. Our results suggest that the best performance can be achieved by using 250ms pause-based segmentation or sentence boundaries determined using a very low confidence score threshold.
منابع مشابه
Story Segmentation of Broadcast News in English, Mandarin and Arabic
In this paper, we present results from a Broadcast News story segmentation system developed for the SRI NIGHTINGALE system operating on English, Arabic and Mandarin news shows to provide input to subsequent question-answering processes. Using a rule-induction algorithm with automatically extracted acoustic and lexical features, we report success rates that are competitive with state-ofthe-art s...
متن کاملOnline Story Segmentation of Multilingual Streaming Broadcast News
We present an online story segmentation approach for Broadcast News (BN) that is built upon and integrated into BBN COTS multilingual Broadcast Monitoring System (BMS). We take a discriminative model-based approach, using Support Vector Machines to segment BN transcriptions into thematically coherent stories within the real-time constraints defined by BMS. We extract lexical, topical and story ...
متن کاملCombined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News
This paper investigates the combined use of pause duration and pitch reset for automatic story segmentation in Mandarin broadcast news. Analysis shows that story boundaries cannot be clearly discriminated from utterance boundaries by speaker-normalized pitch reset due to its large variations across different syllable tone pairs. Instead, speakerand tonenormalized pitch reset can provide a clear...
متن کاملTwo-stage Story Segmentation and Detection on Broadcast News Using Genetic Algorithm
This paper proposes a two-stage story segmentation and detection approach on Mandarin broadcast news. In the two-stage paradigm, a topic classifier is first constructed to find the topic on the broadcast news within a sliding window and determine the potential story boundaries. Then, the problem for story segmentation is transformed to the determination of a chromosome (number sequence) in a se...
متن کاملBroadcast News Story Boundary Detection Using Visual, Audio and Text Features
News video story segmentation is vital for video summarization, story linking, and curation. We present a multimodal segmentation algorithm which fuses video, audio and text cues for story boundary detection. We show that broadcast news closed captioning is a rich and readily available source that improves story boundary detection. Furthermore, we propose an empirical distribution-based feature...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007